
    Annotation concept synthesis and enrichment analysis: a logic-based approach to the interpretation of high-throughput experiments

    Motivation: Annotation Enrichment Analysis (AEA) is a widely used analytical approach to process data generated by high-throughput genomic and proteomic experiments such as gene expression microarrays. The analysis uncovers and summarizes discriminating background information (e.g. GO annotations) for sets of genes identified by experiments (e.g. a set of differentially expressed genes, a cluster). The discovered information is utilized by human experts to find biological interpretations of the experiments.
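The statistical core of most enrichment-analysis tools is a one-sided hypergeometric test per annotation term. The abstract does not give the authors' logic-based formulation, so the sketch below only illustrates the standard test; the function name and all counts are hypothetical:

```python
from math import comb

def enrichment_pvalue(N, K, n, k):
    """One-sided hypergeometric test: probability of seeing at least k
    annotated genes in a study set of n, drawn from N genes in total,
    K of which carry the annotation (e.g. a GO term)."""
    return sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)
    ) / comb(N, n)

# Hypothetical counts: 1000 genes, 50 with the term, 5 of 20 selected genes hit.
p = enrichment_pvalue(N=1000, K=50, n=20, k=5)
```

A small p here suggests the term is over-represented in the study set relative to the background.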

    Multiple aspect trajectories: A case study on fishing vessels in the Northern Adriatic Sea

    In this paper we build, implement and analyze a spatio-temporal database describing the fishing activities in the Northern Adriatic Sea over four years. The database results from the fusion of two complementary data sources: trajectories from fishing vessels (obtained from terrestrial Automatic Identification System, or AIS, data feed) and the corresponding fish catch reports (i.e., the quantity and type of fish caught). We present all the phases of the dataset creation, starting from the raw data and proceeding through data exploration, data cleaning, trajectory reconstruction and semantic enrichment. Moreover, we formalise and compare different techniques to distribute the fish caught by the fishing vessels along their trajectories. We implement the database with MobilityDB, an open source geospatial trajectory data management and analysis platform. Subsequently, guided by our ecological experts, we perform some analyses on the resulting spatio-temporal database, with the goal of mapping the fishing activities on some key species, highlighting all the interesting information and inferring new knowledge that will be useful for fishery management.
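The abstract compares several techniques for distributing a trip's reported catch along its trajectory without detailing them. As a hedged illustration, one simple baseline is to split the catch across trajectory segments in proportion to segment duration; the function name and weighting scheme below are illustrative, not the authors':

```python
def distribute_catch(timestamps, total_catch):
    """Split a trip's reported catch over its trajectory segments,
    weighting each segment by its duration (a simple baseline; the
    paper compares several such techniques)."""
    durations = [t1 - t0 for t0, t1 in zip(timestamps, timestamps[1:])]
    total = sum(durations)
    return [total_catch * d / total for d in durations]

# Three AIS fixes at t = 0, 10, 30 minutes: the longer segment gets more catch.
shares = distribute_catch([0, 10, 30], 90.0)
```

Alternative weightings (e.g. by segment length, or restricted to points classified as actively fishing) follow the same pattern with a different `durations` term.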

    From multiple aspect trajectories to predictive analysis: a case study on fishing vessels in the Northern Adriatic Sea

    In this paper we model spatio-temporal data describing the fishing activities in the Northern Adriatic Sea over four years. We build, implement and analyze a database based on the fusion of two complementary data sources: trajectories from fishing vessels (obtained from terrestrial Automatic Identification System, or AIS, data feed) and fish catch reports (i.e., the quantity and type of fish caught) of the main fishing market of the area. We present all the phases of the database creation, starting from the raw data and proceeding through data exploration, data cleaning, trajectory reconstruction and semantic enrichment. We implement the database by using MobilityDB, an open source geospatial trajectory data management and analysis platform. Subsequently, we perform various analyses on the resulting spatio-temporal database, with the goal of mapping the fishing activities on some key species, highlighting all the interesting information and inferring new knowledge that will be useful for fishery management. Furthermore, we investigate the use of machine learning methods for predicting the Catch Per Unit Effort (CPUE), an indicator of the exploitation of fishing resources, in order to drive specific policy design. A variety of prediction methods, taking as input the data in the database and environmental factors such as sea temperature, wave height and Chlorophyll-a, are put to work in order to assess their prediction ability in this field. To the best of our knowledge, our work represents the first attempt to integrate fishing vessel trajectories derived from AIS data, environmental data and catch data for spatio-temporal prediction of CPUE – a challenging task.
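CPUE itself is simply catch divided by effort. The abstract does not name the prediction methods used, so the snippet below only sketches the simplest conceivable one: a single-predictor least-squares fit of CPUE against an environmental variable such as sea temperature. All names and data points are hypothetical:

```python
def cpue(catch_kg, effort_hours):
    """Catch Per Unit Effort: catch divided by fishing effort."""
    return catch_kg / effort_hours

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b with a single predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return a, my - a * mx

# Hypothetical CPUE observations at four sea temperatures (degrees C).
temps = [12.0, 14.0, 15.0, 18.0]
cpues = [cpue(120, 10), cpue(100, 10), cpue(90, 10), cpue(60, 10)]
a, b = fit_line(temps, cpues)  # slope a < 0: CPUE falls as temperature rises
```

Real models in this setting would use many predictors and nonlinear learners; the point is only the shape of the task: environmental features in, CPUE out.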

    Supporting systematic reviews using LDA-based document representations

    BACKGROUND: Identifying relevant studies for inclusion in a systematic review (i.e. screening) is a complex, laborious and expensive task. Recently, a number of studies have shown that the use of machine learning and text mining methods to automatically identify relevant studies has the potential to drastically decrease the workload involved in the screening phase. The vast majority of these machine learning methods exploit the same underlying principle, i.e. a study is modelled as a bag-of-words (BOW). METHODS: We explore the use of topic modelling methods to derive a more informative representation of studies. We apply Latent Dirichlet allocation (LDA), an unsupervised topic modelling approach, to automatically identify topics in a collection of studies. We then represent each study as a distribution of LDA topics. Additionally, we enrich topics derived using LDA with multi-word terms identified by using an automatic term recognition (ATR) tool. For evaluation purposes, we carry out automatic identification of relevant studies using support vector machine (SVM)-based classifiers that employ both our novel topic-based representation and the BOW representation. RESULTS: Our results show that the SVM classifier is able to identify a greater number of relevant studies when using the LDA representation than the BOW representation. These observations hold for two systematic reviews in the clinical domain and three reviews in the social science domain. CONCLUSIONS: A topic-based feature representation of documents outperforms the BOW representation when applied to the task of automatic citation screening. The proposed term-enriched topics are more informative and less ambiguous to systematic reviewers. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13643-015-0117-0) contains supplementary material, which is available to authorized users.
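As a rough sketch of the representation change described above, the function below projects a bag-of-words document onto a fixed set of topic-word distributions and normalises the result. Real LDA infers the topics and the mixture jointly from the corpus; the two toy topics here are invented purely for illustration:

```python
def topic_mixture(tokens, topics):
    """Represent a document as a distribution over K topics.
    `topics` is a list of {word: probability} maps (hand-made here;
    LDA would learn them from the corpus)."""
    scores = [sum(t.get(w, 1e-9) for w in tokens) for t in topics]
    total = sum(scores)
    return [s / total for s in scores]

# Two toy topics; this document leans towards the 'clinical' one.
topics = [{"trial": 0.6, "patient": 0.4}, {"survey": 0.7, "policy": 0.3}]
mix = topic_mixture(["trial", "patient", "trial", "policy"], topics)
```

The K-dimensional `mix` vector, rather than a sparse vocabulary-sized BOW vector, is what the SVM classifier would consume under the topic-based representation.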

    Feature engineering and a proposed decision-support system for systematic reviewers of medical evidence

    Objectives: Evidence-based medicine depends on the timely synthesis of research findings. An important source of synthesized evidence resides in systematic reviews. However, a bottleneck in review production involves dual screening of citations with titles and abstracts to find eligible studies. For this research, we tested the effect of various kinds of textual information (features) on performance of a machine learning classifier. Based on our findings, we propose an automated system to reduce screening burden, as well as offer quality assurance. Methods: We built a database of citations from 5 systematic reviews that varied with respect to domain, topic, and sponsor. Consensus judgments regarding eligibility were inferred from published reports. We extracted 5 feature sets from citations: alphabetic, alphanumeric+, indexing, features mapped to concepts in systematic reviews, and topic models. To simulate a two-person team, we divided the data into random halves. We optimized the parameters of a Bayesian classifier, then trained and tested models on alternate data halves. Overall, we conducted 50 independent tests. Results: All tests of summary performance (mean F3) surpassed the corresponding baseline, P<0.0001. The ranks for mean F3, precision, and classification error were statistically different across feature sets averaged over reviews; P-values for Friedman's test were .045, .002, and .002, respectively. Differences in ranks for mean recall were not statistically significant. Alphanumeric+ features were associated with best performance; mean reduction in screening burden for this feature type ranged from 88% to 98% for the second pass through citations and from 38% to 48% overall. Conclusions: A computer-assisted, decision support system based on our methods could substantially reduce the burden of screening citations for systematic review teams and solo reviewers. Additionally, such a system could deliver quality assurance both by confirming concordant decisions and by naming studies associated with discordant decisions for further consideration. © 2014 Bekhuis et al.
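The F3 measure reported above is the F-beta score with beta = 3, which weights recall nine times as heavily as precision; this suits screening, where missing an eligible study is far costlier than reading an extra abstract. A minimal sketch from raw counts:

```python
def f_beta(tp, fp, fn, beta=3.0):
    """F-beta score from true positives, false positives and false
    negatives; beta > 1 favours recall over precision."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

With equal precision and recall the score equals both; when they differ, F3 rewards the recall-heavy classifier, which is the behaviour a screening tool should be optimised for.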

    Masking Fuzzy-Searchable Public Databases

    We introduce and study the notion of keyless fuzzy search (KlFS), which allows one to mask a publicly available database in such a way that any third party can retrieve content if and only if it possesses some data that is “close to” the encrypted data – no cryptographic keys are involved. We devise a formal security model that asks a scheme not to leak any information about the data and the queries except for some well-defined leakage function if attackers cannot guess the right query to make. In particular, our definition implies that recovering high entropy data protected with a KlFS scheme is costly. We propose two KlFS schemes: both use locality-sensitive hashes (LSH), cryptographic hashes and symmetric encryption as building blocks. The first scheme is generic and works for abstract plaintext domains. The second scheme is specifically suited for databases of images. To demonstrate the feasibility of our KlFS for images, we implemented and evaluated a prototype system that supports image search by object similarity on a masked database.
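The schemes themselves are not reproduced in the abstract. The toy sketch below only conveys the central mechanic of keyless fuzzy search: derive the encryption key from a locality-sensitive hash of the data, so that anyone holding a "close enough" value can re-derive the key without any key exchange. The scalar bucketing LSH and XOR cipher are deliberate over-simplifications for illustration, not the paper's construction, and the XOR step is insecure:

```python
import hashlib

def lsh_bucket(x, width=10):
    """Toy LSH for scalars: values within the same width-10 band collide."""
    return int(x // width)

def _keystream(bucket, n):
    # Derive n key bytes from the bucket id via a cryptographic hash.
    key = hashlib.sha256(str(bucket).encode()).digest()
    return bytes(key[i % len(key)] for i in range(n))

def mask(x, payload):
    """Encrypt payload under a key derived from the LSH of x
    (toy repeating-key XOR; a real scheme uses proper symmetric encryption)."""
    ks = _keystream(lsh_bucket(x), len(payload))
    return bytes(p ^ k for p, k in zip(payload, ks))

def fuzzy_retrieve(probe, ciphertext):
    """Succeeds iff the probe falls in the same LSH bucket as the original."""
    ks = _keystream(lsh_bucket(probe), len(ciphertext))
    return bytes(c ^ k for c, k in zip(ciphertext, ks))
```

A probe near the masked value recovers the payload; a distant probe derives the wrong key and gets garbage, which is the intended all-or-nothing behaviour.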